wip(agents-api): Add Doc sql queries #979

Vedantsahai18 · 2024-12-20T01:24:53Z

PR Type

Enhancement

Description

Added comprehensive document management system with the following features:

Core document operations:
- Create, read, update, and delete (CRUD) operations
- Owner association and validation
- Metadata support
- Error handling
Advanced search capabilities:
- Full-text search with ranking
- Vector-based embedding search
- Hybrid search combining text and embeddings
- Maximal Marginal Relevance (MMR) for result diversity
List and filtering:
- Paginated document listing
- Sorting options (created_at, updated_at)
- Owner-based filtering
- Support for user, agent, and org ownership types
Performance optimizations:
- Parallel search execution
- Optional simsimd support for faster similarity calculations
- Efficient SQL queries with proper indexing

Changes walkthrough 📝

Relevant files

Enhancement

9 files

__init__.py: `Initialize document management module with core operations` (+25/-0)
agents-api/agents_api/queries/docs/__init__.py
- Added new module for document management operations
- Defined core functionalities for document CRUD operations
- Imported all document-related query functions

create_doc.py: `Document creation with ownership management` (+135/-0)
agents-api/agents_api/queries/docs/create_doc.py
- Implemented document creation with metadata support
- Added owner association functionality
- Included error handling for unique violations and foreign key constraints

delete_doc.py: `Document deletion with ownership validation` (+77/-0)
agents-api/agents_api/queries/docs/delete_doc.py
- Added document deletion functionality
- Implemented ownership validation
- Added cascade deletion for doc_owners

get_doc.py: `Single document retrieval functionality` (+52/-0)
agents-api/agents_api/queries/docs/get_doc.py
- Implemented single document retrieval
- Added owner-based filtering

list_docs.py: `Paginated document listing with filters` (+91/-0)
agents-api/agents_api/queries/docs/list_docs.py
- Added paginated document listing
- Implemented sorting and filtering options
- Added owner-based filtering

mmr.py: `Maximal Marginal Relevance implementation for search` (+109/-0)
agents-api/agents_api/queries/docs/mmr.py
- Implemented Maximal Marginal Relevance algorithm
- Added cosine similarity calculation
- Optimized with simsimd support

search_docs_by_embedding.py: `Vector-based document search implementation` (+70/-0)
agents-api/agents_api/queries/docs/search_docs_by_embedding.py
- Added vector-based document search
- Implemented embedding similarity search
- Added owner filtering support

search_docs_by_text.py: `Full-text document search implementation` (+65/-0)
agents-api/agents_api/queries/docs/search_docs_by_text.py
- Implemented full-text search functionality
- Added text ranking support
- Included owner filtering

search_docs_hybrid.py: `Hybrid document search with score fusion` (+159/-0)
agents-api/agents_api/queries/docs/search_docs_hybrid.py
- Implemented hybrid search combining text and embedding
- Added score fusion algorithm
- Implemented parallel search execution


Important

Add comprehensive document management system with CRUD, advanced search, and performance optimizations.

Document Management:
- Added CRUD operations in create_doc.py, delete_doc.py, get_doc.py, and list_docs.py.
- Owner association and validation, metadata support, and error handling.
Search Capabilities:
- Full-text search in search_docs_by_text.py.
- Vector-based search in search_docs_by_embedding.py.
- Hybrid search in search_docs_hybrid.py.
- MMR algorithm in mmr.py.
Performance:
- Parallel search execution and simsimd support.
Models:
- Updated Doc model in Docs.py with new fields like modality, language, index, embedding_model, and embedding_dimensions.
Tests:
- Added tests in test_docs_queries.py for CRUD operations and listing.

This description was created by Ellipsis for 249513d. It will automatically update as commits are pushed.

qodo-merge-pro-for-open-source · 2024-12-20T01:25:42Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns
SQL Injection: The code uses parameterized queries, which is good, but the `websearch_to_tsquery` function in search_docs_by_text.py could potentially be vulnerable to injection if the input is not properly sanitized before being passed to the function.
⚡ Recommended focus areas for review
Performance Issue: The hybrid search implementation loads all results into memory before fusion. For large result sets this could cause memory issues.
Input Validation: Missing validation for embedding dimensions and model compatibility. Should validate that embedding dimensions match the specified model.
Error Handling: The maximal marginal relevance implementation lacks proper error handling for edge cases like zero vectors or NaN values.
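
To make the error-handling concern concrete, here is a minimal pure-Python sketch of MMR selection that tolerates zero vectors and NaN components. It is not the code in mmr.py (which, per the walkthrough, uses cosine similarity with optional simsimd); the function bodies and the choice to score degenerate vectors as 0.0 are assumptions made for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity that returns 0.0 for zero or non-finite vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Assumption: degenerate inputs (zero norm, NaN, inf) count as "not similar".
    if denom == 0.0 or not math.isfinite(denom) or not math.isfinite(dot):
        return 0.0
    return dot / denom


def maximal_marginal_relevance(
    query_embedding: list[float],
    embedding_list: list[list[float]],
    lambda_mult: float = 0.5,
    k: int = 4,
) -> list[int]:
    """Return indices of up to k candidates, trading relevance for diversity."""
    if not 0 <= lambda_mult <= 1:
        raise ValueError("lambda_mult must be between 0 and 1")
    relevance = [cosine_similarity(query_embedding, e) for e in embedding_list]
    selected: list[int] = []
    remaining = list(range(len(embedding_list)))
    while remaining and len(selected) < k:
        best_idx, best_score = remaining[0], -math.inf
        for i in remaining:
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine_similarity(embedding_list[i], embedding_list[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

With a low `lambda_mult` the second pick favors a candidate that differs from the first, even if it is less relevant to the query.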

qodo-merge-pro-for-open-source · 2024-12-20T01:26:11Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category: Possible issue

Suggestion 1 (score: 8): Add validation for vector embedding dimensions to prevent invalid configurations
Add input validation for embedding_dimensions to ensure it's a positive integer when embedding_model is specified, preventing invalid vector dimensions.
agents-api/agents_api/queries/docs/create_doc.py [122-123]
      data.embedding_model or "none",
    - data.embedding_dimensions or 0,
    + data.embedding_dimensions if data.embedding_model != "none" and data.embedding_dimensions > 0 else 0,
Suggestion importance[1-10]: 8
Why: The suggestion adds crucial validation to prevent invalid vector dimensions when an embedding model is specified, which could cause serious issues in downstream vector operations.

Suggestion 2 (score: 7): Validate diversity-relevance tradeoff parameter to prevent invalid search results
Add input validation for lambda_mult parameter to ensure it's between 0 and 1, as values outside this range would produce invalid diversity-relevance tradeoffs.
agents-api/agents_api/queries/docs/mmr.py [64-69]
      def maximal_marginal_relevance(
          query_embedding: np.ndarray,
          embedding_list: list,
          lambda_mult: float = 0.5,
          k: int = 4,
      ) -> list[int]:
    +     if not 0 <= lambda_mult <= 1:
    +         raise ValueError("lambda_mult must be between 0 and 1")
Suggestion importance[1-10]: 7
Why: Adding validation for lambda_mult is important as values outside [0,1] would produce invalid diversity-relevance tradeoffs, potentially breaking the MMR algorithm's functionality.

Suggestion 3 (score: 7): Validate hybrid search weight parameter to prevent invalid result ranking
Add validation for alpha parameter to ensure it's between 0 and 1, as this weight controls the balance between text and embedding search results.
agents-api/agents_api/queries/docs/search_docs_hybrid.py [107-114]
      async def search_docs_hybrid(
          developer_id: UUID,
          text_query: str = "",
          embedding: List[float] = None,
          k: int = 10,
          alpha: float = 0.5,
    + ) -> List[Doc]:
    +     if not 0 <= alpha <= 1:
    +         raise ValueError("alpha must be between 0 and 1")
Suggestion importance[1-10]: 7
Why: The validation ensures alpha stays within [0,1], preventing incorrect weighting between text and embedding search results that could lead to invalid rankings.
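
As an illustration of why the alpha range matters, the sketch below fuses two score maps with a weighted sum after min-max normalization. The PR does not show the fusion formula used in search_docs_hybrid.py, so `fuse_scores`, the normalization step, and the convention that alpha weights the text score are all assumptions rather than the author's implementation.

```python
def fuse_scores(
    text_scores: dict[str, float],
    embed_scores: dict[str, float],
    alpha: float = 0.5,
) -> list[tuple[str, float]]:
    """Combine per-document scores; alpha weights text, (1 - alpha) weights embeddings."""
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must be between 0 and 1")

    def normalize(scores: dict[str, float]) -> dict[str, float]:
        # Min-max scale to [0, 1] so the two score families are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc_id: 1.0 for doc_id in scores}
        return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

    text_norm = normalize(text_scores)
    embed_norm = normalize(embed_scores)
    fused = {
        doc_id: alpha * text_norm.get(doc_id, 0.0)
        + (1 - alpha) * embed_norm.get(doc_id, 0.0)
        for doc_id in text_norm.keys() | embed_norm.keys()
    }
    # Highest fused score first; ties are broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

With alpha outside [0, 1], one of the two weights becomes negative, so a strong match in one search would lower the fused score instead of raising it.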

qodo-merge-pro-for-open-source · 2024-12-20T01:26:20Z

CI Failure Feedback 🧐

(Checks updated until commit `3600a92`)

Action: Typecheck
Failed stage: Typecheck [❌]
Failed test name: pytype
Failure summary: The pytype check failed due to multiple errors:
1. Import Error: Cannot find module `pycozo` in file `agents_api/common/utils/cozo.py`
2. Type Error: In file `agents_api/queries/docs/search_docs_hybrid.py`:
- Attempting to access `model_copy` attribute on a None value
- Type annotation mismatch for `embedding` variable
3. Type Comment Error: Stray type comments found in `tests/test_workflow_routes.py`
Relevant error logs:
1: ##[group]Operating System
2: Ubuntu
...
1194: [17/369] check agents_api.autogen.Files
1195: [18/369] check agents_api.autogen.Executions
1196: [19/369] check agents_api.autogen.Entries
1197: [20/369] check agents_api.autogen.Agents
1198: [21/369] check agents_api.activities.sync_items_remote
1199: [22/369] check agents_api.clients.__init__
1200: [23/369] check agents_api.common.utils.datetime
1201: [24/369] check agents_api.common.utils.cozo
1202: FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/common/utils/cozo.pyi
1203: /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.common.utils.cozo.imports --module-name agents_api.common.utils.cozo --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/common/utils/cozo.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/common/utils/cozo.py
1204: /home/runner/work/julep/julep/agents-api/agents_api/common/utils/cozo.py:9:1: error: in <module>: Can't find module 'pycozo'. [import-error]
1205: from pycozo import Client
1206: ~~~~~~~~~~~~~~~~~~~~~~~~~
1207: For more details, see https://google.github.io/pytype/errors.html#import-error
...
1213: [30/369] check agents_api.app
1214: [31/369] check agents_api.metrics.counters
1215: [32/369] check agents_api.common.utils.types
1216: [33/369] check agents_api.common.nlp
1217: [34/369] check agents_api.common.storage_handler
1218: [35/369] check agents_api.common.protocol.developers
1219: [36/369] check agents_api.dependencies.exceptions
1220: [37/369] check agents_api.queries.utils
1221: ERROR:pytype.matcher Invalid type: <class 'pytype.abstract.function.ParamSpecMatch'>
...
1322: [138/369] check tests.test_task_queries
1323: [139/369] check tests.test_user_routes
1324: [140/369] check agents_api.routers.internal.__init__
1325: [141/369] check agents_api.worker.__init__
1326: [142/369] check agents_api.queries.files.__init__
1327: [143/369] check tests.test_agent_routes
1328: [144/369] check agents_api.common.exceptions.users
1329: [145/369] check tests.test_workflow_routes
1330: /home/runner/work/julep/julep/agents-api/tests/test_workflow_routes.py:65:1: error: : Stray type comment: object [ignored-type-comment]
1331: # type: object~~~~~~~~~~~~~~~
1332: # type: object
1333: /home/runner/work/julep/julep/agents-api/tests/test_workflow_routes.py:114:1: error: : Stray type comment: object [ignored-type-comment]
1334: # type: object~~~~~~~~~~~~~~~
1335: # type: object
1336: For more details, see https://google.github.io/pytype/errors.html#ignored-type-comment
1337: [146/369] check agents_api.queries.developers.__init__
1338: [147/369] check agents_api.dependencies.__init__
1339: [148/369] check agents_api.rec_sum.entities
1340: [149/369] check agents_api.metrics.__init__
1341: [150/369] check agents_api.queries.users.__init__
1342: [151/369] check agents_api.rec_sum.__init__
1343: [152/369] check agents_api.queries.docs.search_docs_hybrid
1344: FAILED: /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/queries/docs/search_docs_hybrid.pyi
1345: /home/runner/work/julep/julep/agents-api/.venv/bin/python -m pytype.main --disable pyi-error --imports_info /home/runner/work/julep/julep/agents-api/.pytype/imports/agents_api.queries.docs.search_docs_hybrid.imports --module-name agents_api.queries.docs.search_docs_hybrid --platform linux -V 3.12 -o /home/runner/work/julep/julep/agents-api/.pytype/pyi/agents_api/queries/docs/search_docs_hybrid.pyi --analyze-annotated --nofail --none-is-not-bool --quick --strict-none-binding /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py
1346: /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py:98:15: error: in fuse_results: No attribute 'model_copy' on None [attribute-error]
1347: In Optional[agents_api.autogen.Docs.Doc]
1348: doc = doc.model_copy() # or copy if you are using Pydantic
1349: ~~~~~~~~~~~~~~
1350: /home/runner/work/julep/julep/agents-api/agents_api/queries/docs/search_docs_hybrid.py:105:1: error: in <module>: Type annotation for embedding does not match type of assignment [annotation-type-mismatch]
...
1445: # fuse them
1446: ~~~~~~~~~~~~~~~
1447: fused = fuse_results(text_results, embed_results, alpha)
1448: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1449: # Then pick top K overall
1450: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1451: return fused[:k]
1452: ~~~~~~~~~~~~~~~~~~~~
1453: For more details, see https://google.github.io/pytype/errors.html
...
1463: [162/369] check tests.__init__
1464: [163/369] check agents_api.queries.docs.__init__
1465: [164/369] check agents_api.rec_sum.trim
1466: [165/369] check agents_api.common.exceptions.agents
1467: [166/369] check agents_api.queries.docs.mmr
1468: [167/369] check tests.test_messages_truncation
1469: [168/369] check agents_api.rec_sum.summarize
1470: [169/369] check agents_api.clients.worker.worker
1471: ninja: build stopped: cannot make progress due to previous errors.
1472: Computing dependencies
1473: Generated API key since not set in the environment: 60910645726996193694483681243020
1474: Sentry DSN not found. Sentry will not be enabled.
1475: Analyzing 341 sources with 0 local dependencies
1476: Leaving directory '.pytype'
1477: ##[error]Process completed with exit code 1.
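
The `attribute-error` at search_docs_hybrid.py:98 comes from calling a method on a value typed `Optional[Doc]`. One common way to satisfy the type checker is to narrow the value with an explicit `None` check before copying. The sketch below uses a dataclass as a stand-in for the real Pydantic `Doc` model, and `attach_scores` is a hypothetical helper, not the `fuse_results` function from the PR.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Doc:  # stand-in for agents_api.autogen.Docs.Doc (assumption)
    id: str
    distance: Optional[float] = None


def attach_scores(docs_by_id: dict[str, Doc], scores: dict[str, float]) -> list[Doc]:
    """Copy each known doc with its fused score, skipping ids with no doc."""
    fused: list[Doc] = []
    for doc_id, score in scores.items():
        doc = docs_by_id.get(doc_id)  # typed Optional[Doc]
        if doc is None:
            # Narrowing: after this check the type checker treats doc as Doc.
            continue
        # With Pydantic v2 this could be doc.model_copy(update={"distance": score}).
        fused.append(replace(doc, distance=score))
    return fused
```

The original docs are left unmodified because each result is a copy.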


ellipsis-dev

❌ Changes requested. Reviewed everything up to 6c77490 in 1 minute and 30 seconds

More details

Looked at 848 lines of code in 10 files
Skipped 0 files when reviewing.
Skipped posting 8 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/search_docs_hybrid.py:13

Draft comment:
The import statement for run_concurrently is unnecessary since it's not used in the code. Consider removing it to clean up the imports.
Reason this comment was not posted:
Confidence changes required: 10%
The import statement for run_concurrently is unnecessary since it's not used in the code.

2. agents-api/agents_api/queries/docs/search_docs_hybrid.py:134

Draft comment:
Appending an empty list to tasks is incorrect. You should append a coroutine or task. Consider using None or a similar placeholder if you intend to skip adding a task.
Reason this comment was not posted:
Confidence changes required: 50%
The code uses tasks.append([]) which is incorrect for appending tasks. It should append a coroutine instead.

3. agents-api/agents_api/queries/docs/search_docs_hybrid.py:147

Draft comment:
Appending an empty list to tasks is incorrect. You should append a coroutine or task. Consider using None or a similar placeholder if you intend to skip adding a task.
Reason this comment was not posted:
Confidence changes required: 50%
The code uses tasks.append([]) which is incorrect for appending tasks. It should append a coroutine instead.

4. agents-api/agents_api/queries/docs/search_docs_hybrid.py:154

Draft comment:
Ensure that at least one valid task is added to tasks before calling gather. If both text_query and embedding are empty, tasks will contain only empty lists, leading to an error.
Reason this comment was not posted:
Confidence changes required: 50%
The code does not handle the case where both text_query and embedding are empty, which could lead to an error when calling gather.
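
Draft comments 2 to 4 describe the same hazard: `asyncio.gather` expects awaitables, and a plain list placeholder is not one, so passing it should fail with a `TypeError`. A minimal sketch of one alternative is to schedule only the searches that have input and fill in empty results afterwards. `search_text`, `search_embedding`, and `run_searches` are hypothetical stand-ins, not the functions in the PR.

```python
import asyncio
from typing import Optional


async def search_text(query: str) -> list[str]:
    await asyncio.sleep(0)  # placeholder for a database round trip
    return [f"text:{query}"]


async def search_embedding(embedding: list[float]) -> list[str]:
    await asyncio.sleep(0)  # placeholder for a database round trip
    return [f"embedding:{len(embedding)}"]


async def run_searches(
    text_query: str = "",
    embedding: Optional[list[float]] = None,
) -> tuple[list[str], list[str]]:
    """Run whichever searches have input concurrently; reject empty requests."""
    if not text_query and not embedding:
        raise ValueError("provide text_query, embedding, or both")
    text_task = search_text(text_query) if text_query else None
    embed_task = search_embedding(embedding) if embedding else None
    # Only real coroutines are handed to gather; results come back in order.
    pending = [task for task in (text_task, embed_task) if task is not None]
    results = iter(await asyncio.gather(*pending))
    text_results = next(results) if text_task is not None else []
    embed_results = next(results) if embed_task is not None else []
    return text_results, embed_results
```

Calling `asyncio.run(run_searches("hello"))` runs only the text search and returns an empty list for the embedding side.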

5. agents-api/agents_api/queries/docs/search_docs_hybrid.py:110

Draft comment:
The embedding parameter should have a default value of an empty list [] instead of None to avoid type issues and simplify checks.
Reason this comment was not posted:
Confidence changes required: 50%
The embedding parameter should have a default value of an empty list instead of None to avoid type issues.
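
One caveat on this draft comment: in Python a default of `[]` is created once and shared across calls, so it can leak state between callers. A sketch of the usual alternative is to keep `None` as the default, annotate the parameter as `Optional`, and normalize inside the function. Both function names below are hypothetical.

```python
from typing import Optional


def append_with_shared_default(value: float, bucket: list[float] = []) -> list[float]:
    # Pitfall: the same list object is reused on every call without `bucket`.
    bucket.append(value)
    return bucket


def normalize_embedding(embedding: Optional[list[float]] = None) -> list[float]:
    """Return a fresh list so callers never share state through the default."""
    return list(embedding) if embedding is not None else []
```

Annotating the parameter as `Optional[List[float]] = None` would also be one way to address the `annotation-type-mismatch` that pytype reports for `embedding`, assuming that error stems from the `List[float] = None` signature.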

6. agents-api/agents_api/queries/docs/create_doc.py:69

Draft comment:
Consider adding more exception handling for other potential database errors, such as asyncpg.DataError or asyncpg.SyntaxOrAccessError, to make the function more robust.
Reason this comment was not posted:
Confidence changes required: 50%
The rewrap_exceptions decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.

7. agents-api/agents_api/queries/docs/delete_doc.py:43

Draft comment:
Consider adding more exception handling for other potential database errors, such as asyncpg.DataError or asyncpg.SyntaxOrAccessError, to make the function more robust.
Reason this comment was not posted:
Confidence changes required: 50%
The rewrap_exceptions decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.

8. agents-api/agents_api/queries/docs/get_doc.py:29

Draft comment:
Consider adding exception handling for potential database errors, such as asyncpg.DataError or asyncpg.SyntaxOrAccessError, to make the function more robust.
Reason this comment was not posted:
Confidence changes required: 50%
The rewrap_exceptions decorator is used to handle exceptions, but it doesn't cover all possible exceptions that might occur during database operations.

Workflow ID: wflow_0SgPnESfFL0Scwfr


ellipsis-dev · 2024-12-20T01:26:31Z

agents-api/agents_api/queries/docs/mmr.py

+        return similarity
+
+
+def maximal_marginal_relevance(


This is an exact duplicate of the existing implementation.

function maximal_marginal_relevance (mmr.py)

ellipsis-dev

👍 Looks good to me! Incremental review on 93673b7 in 1 minute and 3 seconds

More details

Looked at 1195 lines of code in 21 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/get_doc.py:5

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'list_docs.py', 'search_docs_by_embedding.py', and 'search_docs_by_text.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

2. agents-api/agents_api/queries/docs/list_docs.py:5

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in other files like 'search_docs_by_embedding.py', and 'search_docs_by_text.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

3. agents-api/agents_api/queries/docs/search_docs_by_embedding.py:6

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code. This is also applicable in 'search_docs_by_text.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

4. agents-api/agents_api/queries/docs/search_docs_by_text.py:6

Draft comment:
The import statement for 'asyncpg' is unnecessary and can be removed to clean up the code.
Reason this comment was not posted:
Confidence changes required: 50%
The import statement for 'asyncpg' is unnecessary in multiple files as it is not used anywhere in the code. Removing it will clean up the code.

5. agents-api/agents_api/queries/docs/create_doc.py:98

Draft comment:
The 'org' option is removed from the owner_type Literal, but this change is not reflected in the PR description. This should be documented for clarity. This change is also applicable in 'delete_doc.py', 'get_doc.py', and 'list_docs.py'.
Reason this comment was not posted:
Confidence changes required: 50%
The 'org' option is removed from the owner_type Literal in multiple files, but this change is not reflected in the PR description. This should be documented for clarity.

Workflow ID: wflow_pnbwiXEmTCJVnbrw


ellipsis-dev

❌ Changes requested. Incremental review on dc0ec36 in 48 seconds

More details

Looked at 939 lines of code in 28 files
Skipped 0 files when reviewing.
Skipped posting 1 drafted comments based on config settings.

1. agents-api/agents_api/queries/users/delete_user.py:59

Draft comment:
The asyncpg.exceptions.UniqueViolationError should not be handled here as it is not relevant to delete operations. Consider removing this exception handling.
Reason this comment was not posted:
Marked as duplicate.

Workflow ID: wflow_WUrKdAU41fxw9Cd0


ellipsis-dev · 2024-12-20T20:26:49Z

agents-api/agents_api/queries/agents/delete_agent.py

+            status_code=404,
+            detail="The specified developer does not exist.",
+        ),
+        asyncpg.exceptions.UniqueViolationError: partialclass(


The asyncpg.exceptions.UniqueViolationError should not be handled here as it is not relevant to delete operations. Consider removing this exception handling.

ellipsis-dev

👍 Looks good to me! Incremental review on 831e950 in 51 seconds

More details

Looked at 440 lines of code in 11 files
Skipped 0 files when reviewing.
Skipped posting 3 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/get_doc.py:36

Draft comment:
Using ast.literal_eval on d["content"] can be unsafe if the content is not guaranteed to be a valid Python literal. Consider using a safer method to parse or handle the content.
Reason this comment was not posted:
Comment was on unchanged code.
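
For context on this draft comment, `ast.literal_eval` raises `ValueError` or `SyntaxError` when the string is not a valid Python literal, so an unguarded call can fail on ordinary prose. The sketch below shows one defensive wrapper. It assumes the stored content is either plain text or the string form of a list of strings, which the PR does not confirm, and `parse_content` is a hypothetical helper.

```python
import ast
from typing import Union


def parse_content(raw: str) -> Union[str, list[str]]:
    """Parse a stored list literal, falling back to the raw string."""
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal (for example, ordinary prose): keep it as text.
        return raw
    # Accept only the expected shape; anything else is treated as text.
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    return raw
```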

2. agents-api/agents_api/queries/docs/embed_snippets.py:10

Draft comment:
vectorizer_query is set to None. This is a placeholder and should be replaced with an actual query before deployment.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment is essentially repeating information that's already explicitly stated in a TODO comment one line above. The TODO comment is more visible and serves the same purpose. This makes the PR comment redundant and not adding any new information or value.
Perhaps the comment is trying to emphasize the importance of not deploying with a None query, which could be a critical issue.
While deployment concerns are valid, the existing TODO comment already makes it clear this needs to be replaced, and deployment issues would be caught by basic testing since the function would fail immediately.
Delete the comment as it's redundant with the existing TODO comment and doesn't provide additional actionable value.

3. agents-api/agents_api/queries/entries/list_entries.py:88

Draft comment:
Ensure sort_by and direction are validated and sanitized to prevent SQL injection, as they are used in string interpolation for SQL queries.
Reason this comment was not posted:
Comment did not seem useful.
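
Column names and sort directions cannot be bound as query parameters, so interpolated values are usually checked against an allow-list first. A minimal sketch follows. The permitted column names are assumptions borrowed from the sorting options listed in the PR description, and `build_order_clause` is a hypothetical helper, not code from list_entries.py.

```python
ALLOWED_SORT_COLUMNS = {"created_at", "updated_at"}  # assumed column names
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_order_clause(sort_by: str, direction: str) -> str:
    """Return an ORDER BY clause built only from allow-listed identifiers."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    direction_sql = direction.upper()
    if direction_sql not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    # Safe to interpolate: both values are now known constants.
    return f"ORDER BY {sort_by} {direction_sql}"
```

If the router already restricts these parameters with `Literal` types, this check would be redundant at runtime but still documents the invariant next to the query.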

Workflow ID: wflow_TPwdOI1YwvBHnQ9P


ellipsis-dev

👍 Looks good to me! Incremental review on 249513d in 1 minute and 4 seconds

More details

Looked at 1148 lines of code in 15 files
Skipped 0 files when reviewing.
Skipped posting 5 drafted comments based on config settings.

1. agents-api/agents_api/queries/docs/create_doc.py:153

Draft comment:
The function assumes data.content is always a list. Consider adding a check or conversion to handle cases where data.content might be a string or other type to prevent unexpected behavior.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment suggests adding a check that already exists in the code. The code already handles both list and non-list cases appropriately through the isinstance() check and separate logic paths. The comment appears to be incorrect in stating that the function "assumes data.content is always a list" when it clearly doesn't make that assumption.
Could there be other types besides list and non-list that need handling? Could the CreateDocRequest type definition enforce the content type making this check unnecessary?
The code's else branch handles any non-list type appropriately by treating it as a single content item. The type validation would be handled by FastAPI's request validation via CreateDocRequest if needed.
The comment should be deleted because it incorrectly suggests adding a check that already exists in the code. The code already properly handles both list and non-list content.

2. agents-api/agents_api/queries/docs/delete_doc.py:25

Draft comment:
The EXISTS clause in the SQL query might be redundant since the doc_owners entry is already deleted in the deleted_owners CTE. Consider revising the logic to ensure the docs entry is only deleted if the doc_owners entry existed prior to deletion.
Reason this comment was not posted:
Decided after close inspection that this draft comment was likely wrong and/or not actionable:
The comment raises an interesting point about the query logic - we delete from doc_owners first, then check if that same record existed in doc_owners before deleting from docs. However, this could actually be intentional behavior to ensure atomicity and proper ordering. Without deeper knowledge of the data model and requirements, we can't be certain this is actually a problem vs a deliberate safeguard.
I may be missing important context about transaction isolation levels and race conditions that could make this pattern necessary. The EXISTS check might serve as an important guard rail.
While the comment raises an interesting point about query structure, we don't have enough context to confidently say this is incorrect or needs to be changed.
This comment is speculative and requires more context about the data model and requirements to validate. Following our rules, we should err on the side of removing speculative comments.

3. agents-api/agents_api/queries/docs/list_docs.py:135

Draft comment:
The metadata_filter is directly appended to the query string, which could lead to SQL injection if not properly handled. Ensure that metadata keys and values are safely included in the query to prevent SQL injection vulnerabilities.
Reason this comment was not posted:
Comment did not seem useful.
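
If the metadata filter is built by string concatenation, one way to address the concern is to emit numbered placeholders and pass keys and values as query arguments. The sketch below assumes a JSONB column named `metadata` and PostgreSQL-style `$n` placeholders as used by asyncpg. `build_metadata_filter` is a hypothetical helper, not the code in list_docs.py.

```python
from typing import Any


def build_metadata_filter(
    metadata_filter: dict[str, Any],
    start_index: int = 1,
) -> tuple[str, list[Any]]:
    """Return a SQL fragment with $n placeholders and its parameter list."""
    clauses: list[str] = []
    params: list[Any] = []
    for key, value in metadata_filter.items():
        key_pos = start_index + len(params)
        # Both key and value are bound parameters; nothing user-supplied
        # is interpolated into the SQL text itself.
        clauses.append(f"metadata->>${key_pos}::text = ${key_pos + 1}::text")
        params.extend([key, str(value)])
    return " AND ".join(clauses), params
```

`start_index` lets the fragment be appended to a query that already uses earlier placeholders, for example `$1` for the developer id and `$2` for the owner id.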

4. agents-api/agents_api/queries/docs/search_docs_by_text.py:19

Draft comment:
The owner_types and owner_ids are passed as JSONB arrays, which might not be correctly handled by the SQL function. Ensure that these arrays are properly converted to UUID arrays to prevent unexpected behavior.
Reason this comment was not posted:
Comment did not seem useful.

5. agents-api/tests/test_docs_queries.py:11

Draft comment:
Consider adding tests for search_docs_by_embedding and search_docs_hybrid to ensure comprehensive coverage of the search functionalities.
Reason this comment was not posted:
Confidence changes required: 50%
The test_docs_queries.py file has a test for search_docs_by_text but it lacks tests for search_docs_by_embedding and search_docs_hybrid. Adding these tests would ensure comprehensive coverage of the search functionalities.

Workflow ID: wflow_qeLRxNLHNTGhklta


wip(agents-api): Doc queries

6c77490

qodo-merge-pro-for-open-source bot added the Review effort [1-5]: 4 label Dec 20, 2024

refactor: Lint agents-api (CI)

b427e38

ellipsis-dev bot reviewed Dec 20, 2024

Vedantsahai18 and others added 2 commits December 20, 2024 14:43

fix: fixed the CRD doc queries + added tests

93673b7

refactor: Lint agents-api (CI)

7b0be5c

ellipsis-dev bot reviewed Dec 20, 2024

Vedantsahai18 and others added 2 commits December 20, 2024 15:25

wip: initial set of exceptions added

dc0ec36

refactor: Lint agents-api (CI)

32d67bc

ellipsis-dev bot reviewed Dec 20, 2024

Vedantsahai18 and others added 2 commits December 20, 2024 16:40

chore: added embedding reading + doctrings updates

831e950

refactor: Lint agents-api (CI)

74add36

ellipsis-dev bot reviewed Dec 20, 2024

Vedantsahai18 and others added 2 commits December 21, 2024 03:12

chore: updated migrations + added indices support

249513d

refactor: Lint agents-api (CI)

d7d9cd4

ellipsis-dev bot reviewed Dec 21, 2024

Merge branch 'f/switch-to-pg' into f/doc-queries

3600a92

creatorrr marked this pull request as draft December 21, 2024 08:24
